Rio Grande do Sul
Synthetic Data: AI's New Weapon Against Android Malware
Nogueira, Angelo Gaspar Diniz, Paim, Kayua Oleques, Bragança, Hendrio, Mansilha, Rodrigo Brandão, Kreutz, Diego
The ever-increasing number of Android devices and the accelerated evolution of malware, reaching over 35 million samples by 2024, highlight the critical importance of effective detection methods. Attackers are now using Artificial Intelligence to create sophisticated malware variations that can easily evade traditional detection techniques. Although machine learning has shown promise in malware classification, its success relies heavily on the availability of up-to-date, high-quality datasets. The scarcity and high cost of obtaining and labeling real malware samples present significant challenges in developing robust detection models. In this paper, we propose MalSynGen, a Malware Synthetic Data Generation methodology that uses a conditional Generative Adversarial Network (cGAN) to generate synthetic tabular data. This data preserves the statistical properties of real-world data and improves the performance of Android malware classifiers. We evaluated the effectiveness of this approach using various datasets and metrics that assess the fidelity of the generated data, its utility in classification, and the computational efficiency of the process. Our experiments demonstrate that MalSynGen can generalize across different datasets, providing a viable solution to the problems of obsolete and low-quality data in malware detection. With approximately 3 billion Android devices in operation worldwide [1], the mobile cybersecurity landscape faces formidable challenges. In 2024 alone, Kaspersky reported over 33.3 million cyberattacks targeting smartphone users globally, encompassing diverse forms of malware and unwanted software [2]. Adding to this problem, attackers are using Artificial Intelligence (AI) to rapidly generate new malware variants by exploiting patterns learned from existing malware [3].
Using LLMs to Program Board Games and Variations
Becker, Álvaro Guglielmin, Rossato, Lana Bertoldo, Tavares, Anderson Rocha
Creating programs to represent board games can be a time-consuming task. Large Language Models (LLMs) arise as appealing tools to expedite this process, given their capacity to efficiently generate code from simple contextual information. In this work, we propose a method to test how capable three LLMs (Claude, DeepSeek and ChatGPT) are at creating code for board games, as well as new variants of existing games.
Handling Missing Data in Probabilistic Regression Trees: Methods and Implementation in R
Prass, Taiane Schaedler, Neimaier, Alisson Silva, Pumi, Guilherme
Probabilistic Regression Trees (PRTrees) generalize traditional decision trees by incorporating probability functions that associate each data point with different regions of the tree, providing smooth decisions and continuous responses. This paper introduces an adaptation of PRTrees capable of handling missing values in covariates through three distinct approaches: (i) a uniform probability method, (ii) a partial observation approach, and (iii) a dimension-reduced smoothing technique. The proposed methods preserve the interpretability properties of PRTrees while extending their applicability to incomplete datasets. Simulation studies under MCAR conditions demonstrate the relative performance of each approach, including comparisons with traditional regression trees on smooth function estimation tasks. The proposed methods, together with the original version, have been developed in R with highly optimized routines and are distributed in the PRTree package, publicly available on CRAN. In this paper we also present and discuss the main functionalities of the PRTree package, providing researchers and practitioners with new tools for incomplete data analysis.
Generative AI as a catalyst for democratic Innovation: Enhancing citizen engagement in participatory budgeting
Sousa, Italo Alberto do Nascimento, Machado, Jorge, Vaz, Jose Carlos
This research examines the role of Generative Artificial Intelligence (AI) in enhancing citizen engagement in participatory budgeting. In response to challenges like declining civic participation and increased societal polarization, the study explores how online political participation can strengthen democracy and promote social equity. By integrating Generative AI into public consultation platforms, the research aims to improve citizen proposal formulation and foster effective dialogue between citizens and government. It assesses the capacities governments need to implement AI-enhanced participatory tools, considering technological dependencies and vulnerabilities. Analyzing technological structures, actors, interests, and strategies, the study contributes to understanding how technological advancements can reshape participatory institutions to better facilitate citizen involvement. Ultimately, the research highlights how Generative AI can transform participatory institutions, promoting inclusive, democratic engagement and empowering citizens.
Artificial neural networks ensemble methodology to predict significant wave height
Minuzzi, Felipe Crivellaro, Farina, Leandro
Institute of Mathematics and Statistics, Federal University of Rio Grande do Sul (UFRGS); Center for Coastal and Oceanic Geology Studies (CECO), Federal University of Rio Grande do Sul (UFRGS). The forecast of wave variables is important for several applications that depend on a better description of the ocean state. Due to the chaotic behaviour of the differential equations that model this problem, a well-known strategy to overcome the difficulties is to run several simulations, for instance by varying the initial condition, and to average their results, creating an ensemble. Moreover, in the last few years, given the amount of available data and the increase in computational power, machine learning algorithms have been applied as surrogates for traditional numerical models, yielding comparable or better results. In this work, we present a methodology to create an ensemble of different artificial neural network architectures, namely MLP, RNN, LSTM, CNN and a hybrid CNN-LSTM, which aims to predict significant wave height at six different locations on the Brazilian coast. The networks are trained using NOAA's numerical reforecast data and target the residual between observational data and the numerical model output. A new strategy to create the training and target datasets is demonstrated. Numerical simulations of both weather and ocean parameters rely on the evolution of nonlinear dynamical systems that are highly sensitive to initial conditions. Considering that errors are present in the observations and analysis, and therefore in the initial conditions, the concept of a unique deterministic solution of the governing equations becomes fragile [1, 2].
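The residual-targeting ensemble described above can be illustrated with a minimal sketch. It assumes the residual is defined as observation minus numerical model output and that the ensemble is a plain average of the members' predictions. Neither the sign convention nor the averaging rule is confirmed by the abstract, and the function name and sample values are hypothetical.

```python
import numpy as np

def ensemble_corrected_forecast(numerical_forecast, member_residuals):
    """Add the ensemble-mean predicted residual to the numerical forecast.

    numerical_forecast: array of shape (T,), numerical model output.
    member_residuals: array of shape (M, T), one row of predicted
        residuals (assumed: observation minus model) per network.
    """
    numerical_forecast = np.asarray(numerical_forecast, dtype=float)
    member_residuals = np.asarray(member_residuals, dtype=float)
    # Assumption: the ensemble is a simple unweighted mean over members.
    mean_residual = member_residuals.mean(axis=0)
    return numerical_forecast + mean_residual

# Hypothetical significant wave heights (m) from a numerical model.
hs_model = np.array([1.0, 2.0, 3.0])
# Hypothetical residuals predicted by two ensemble members.
residuals = np.array([[0.25, -0.5, 0.0],
                      [0.75, 0.5, 1.0]])
corrected = ensemble_corrected_forecast(hs_model, residuals)
```

Here each row of `residuals` stands in for the output of one trained network (for example an MLP or an LSTM); the networks themselves are omitted.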
Variable selection for minimum-variance portfolios
Moura, Guilherme V., Santos, André P., Torrent, Hudson S.
Machine learning (ML) methods have been successfully employed in identifying variables that can predict the equity premium of individual stocks. In this paper, we investigate whether ML can also be helpful in selecting variables relevant for optimal portfolio choice. To address this question, we parameterize minimum-variance portfolio weights as a function of a large pool of firm-level characteristics as well as their second-order and cross-product transformations, yielding a total of 4,610 predictors. We find that the gains from employing ML to select relevant predictors are substantial: minimum-variance portfolios achieve lower risk relative to sparse specifications commonly considered in the literature, especially when non-linear terms are added to the predictor space. Moreover, some of the selected predictors that help decrease portfolio risk also increase returns, leading to minimum-variance portfolios with good performance in terms of Sharpe ratios in some situations. Our evidence suggests that ad-hoc sparsity can be detrimental to the performance of minimum-variance characteristics-based portfolios.
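For readers unfamiliar with the baseline problem, the sketch below computes the textbook global minimum-variance weights, w = Σ⁻¹1 / (1ᵀΣ⁻¹1), from a covariance matrix. This is not the characteristics-based parameterization studied in the paper; it only illustrates the objective that parameterization targets. The function name and the two-asset covariance matrix are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form global minimum-variance weights (fully invested, no
    short-sale constraint): w = inv(cov) @ 1 / (1' inv(cov) 1)."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    # Solve cov @ raw = 1 instead of forming the inverse explicitly.
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

# Hypothetical covariance of two uncorrelated assets
# (return variances 0.04 and 0.01).
cov = np.array([[0.04, 0.00],
                [0.00, 0.01]])
w = min_variance_weights(cov)
portfolio_variance = float(w @ cov @ w)
```

With uncorrelated assets the weights are inversely proportional to the variances, so the lower-variance asset receives the larger share.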